Load packages
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(gapminder))
View the first few rows of the dataset
head(gapminder)
## # A tibble: 6 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
Inspect which continents appear in the data
unique(gapminder$continent)
## [1] Asia Europe Africa Americas Oceania
## Levels: Africa Americas Asia Europe Oceania
Structure of the data before removing Oceania
str(gapminder)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num 779 821 853 836 740 ...
Structure of the data after removing Oceania and dropping unused factor levels
gapminder %>%
filter(continent != "Oceania") %>%
droplevels() %>%
str()
## Classes 'tbl_df', 'tbl' and 'data.frame': 1680 obs. of 6 variables:
## $ country : Factor w/ 140 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 4 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num 779 821 853 836 740 ...
We can tell from above that

* Number of rows decreased from 1704 to 1680.
* Number of countries decreased from 142 to 140.
* Number of continents decreased from 5 to 4.
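As a minimal sketch of why droplevels() appears in the pipeline: filter() removes rows but keeps unused factor levels, so the level counts only change once they are dropped explicitly. The snippet assumes the same gapminder data. The names no_oceania and only_continent are introduced here purely for illustration, and fct_drop() from forcats is shown as one possible alternative that targets a single factor.

```r
library(dplyr)
library(forcats)
library(gapminder)

# Illustrative name, not part of the original analysis
no_oceania <- gapminder %>%
  filter(continent != "Oceania")

nrow(no_oceania)                # rows are gone: 1680
nlevels(no_oceania$continent)   # the unused level is still there: 5

# droplevels() drops unused levels in every factor column
nlevels(droplevels(no_oceania)$continent)   # 4
nlevels(droplevels(no_oceania)$country)     # 140

# fct_drop() drops unused levels of one factor only
only_continent <- no_oceania %>%
  mutate(continent = fct_drop(continent))
nlevels(only_continent$continent)   # 4
nlevels(only_continent$country)     # still 142
```

droplevels() is the simpler choice when every factor should be cleaned up. fct_drop() may be preferable when some factors need to keep their full set of levels.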
suppressPackageStartupMessages(library(forcats))
Now sort the rows by gdpPercap with arrange(). Compared with the scatter plot later, this has no effect on the plot, because arrange() reorders the rows but leaves the factor levels unchanged.
gapminder %>%
arrange(gdpPercap) %>%
ggplot(aes(log(gdpPercap),lifeExp,color=continent)) + geom_point()
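For contrast, here is a sketch of an operation that does change the factor order. fct_reorder() from forcats rewrites the level order, which ggplot2 then uses for the color legend. Ordering continents by median gdpPercap is an assumption chosen for illustration, and the names arranged and reordered are hypothetical.

```r
library(dplyr)
library(forcats)
library(gapminder)

# arrange() sorts rows only; the levels keep their original (alphabetical) order
arranged <- gapminder %>%
  arrange(gdpPercap)
levels(arranged$continent)

# fct_reorder() reorders the levels themselves, here by the median
# gdpPercap within each continent (median is the default summary function)
reordered <- gapminder %>%
  mutate(continent = fct_reorder(continent, gdpPercap))
levels(reordered$continent)
```

Piping reordered into the same ggplot() call would place the points identically but list the continents in the new order in the legend.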
There are two countries in Oceania: Australia and New Zealand. Their 24 rows account for the drop from 1704 to 1680 rows when Oceania is removed.
gapminder %>%
filter(continent == "Oceania")
## # A tibble: 24 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Australia Oceania 1952 69.1 8691212 10040.
## 2 Australia Oceania 1957 70.3 9712569 10950.
## 3 Australia Oceania 1962 70.9 10794968 12217.
## 4 Australia Oceania 1967 71.1 11872264 14526.
## 5 Australia Oceania 1972 71.9 13177000 16789.
## 6 Australia Oceania 1977 73.5 14074100 18334.
## 7 Australia Oceania 1982 74.7 15184200 19477.
## 8 Australia Oceania 1987 76.3 16257249 21889.
## 9 Australia Oceania 1992 77.6 17481977 23425.
## 10 Australia Oceania 1997 78.8 18565243 26998.
## # ... with 14 more rows
Compute the mean lifeExp for each country and save it to mean_life_exp.csv
(mean_life_exp = gapminder %>%
group_by(country) %>%
summarise(mu = mean(lifeExp)))
## # A tibble: 142 x 2
## country mu
## <fct> <dbl>
## 1 Afghanistan 37.5
## 2 Albania 68.4
## 3 Algeria 59.0
## 4 Angola 37.9
## 5 Argentina 69.1
## 6 Australia 74.7
## 7 Austria 73.1
## 8 Bahrain 65.6
## 9 Bangladesh 49.8
## 10 Belgium 73.6
## # ... with 132 more rows
write_csv(mean_life_exp, "mean_life_exp.csv")
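One thing worth checking after writing a CSV is what comes back when it is read again. The sketch below assumes the same summary table and writes to a temporary file (the path csv_path is illustrative) so it does not overwrite mean_life_exp.csv. A plain text file cannot store the factor type, so read_csv() returns country as character unless a column type is supplied.

```r
library(dplyr)
library(readr)
library(gapminder)

mean_life_exp <- gapminder %>%
  group_by(country) %>%
  summarise(mu = mean(lifeExp))

# Illustrative temporary path, used instead of the real output file
csv_path <- tempfile(fileext = ".csv")
write_csv(mean_life_exp, csv_path)

# Default guessing: country comes back as character, not factor
as_read <- read_csv(csv_path, col_types = cols())
class(as_read$country)

# Asking for a factor explicitly; levels follow order of appearance in the file
as_factor_read <- read_csv(
  csv_path,
  col_types = cols(country = col_factor(), mu = col_double())
)
class(as_factor_read$country)
```

If the exact factor levels and their order matter, saveRDS() and readRDS() are an alternative that preserves the R object as is.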
Load package plotly
suppressPackageStartupMessages(library(plotly))
A scatter plot of lifeExp vs log(gdpPercap), colored by continent
(p = ggplot(gapminder, aes(log(gdpPercap), lifeExp, color=continent)) + geom_point())
Here is the same plot in plotly. Hovering over a point shows its values and its continent. This is especially helpful here because many points are clumped together.
plot_ly(gapminder, x=~log(gdpPercap), y=~lifeExp, color=~continent)
## No trace type specified:
## Based on info supplied, a 'scatter' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#scatter
## No scatter mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
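The messages above appear because neither a trace type nor a mode was given, so plotly picks them itself. A minimal sketch of the same call with both stated explicitly, which should produce the same markers without the messages (the name fig is illustrative):

```r
library(plotly)
library(gapminder)

# Same mappings as above, with the trace type and mode spelled out
fig <- plot_ly(
  gapminder,
  x = ~log(gdpPercap),
  y = ~lifeExp,
  color = ~continent,
  type = "scatter",
  mode = "markers"
)
fig
```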
Save the above scatter-plot to file
ggsave("scatter_plot.png", plot=p)
## Saving 7 x 5 in image
Specifying plot=p matters when you have made multiple plots and want to save only one of them, because ggsave() saves the last plot by default
p0 = ggplot(gapminder, aes(lifeExp)) + geom_histogram()
ggsave("scatter_plot0.png", plot=p)
## Saving 7 x 5 in image
ggsave("hist0.png", plot=p0)
## Saving 7 x 5 in image
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
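To make the default behaviour concrete, here is a sketch under the assumption that the two plots are built in the same order as above. Files go to a temporary directory (out_dir is an illustrative name) and the size is given explicitly, so the "Saving 7 x 5 in image" message is not expected. Passing bins = 30 to geom_histogram() states the bin count that stat_bin() otherwise chooses with a message.

```r
library(ggplot2)
library(gapminder)

p  <- ggplot(gapminder, aes(log(gdpPercap), lifeExp, color = continent)) +
  geom_point()
p0 <- ggplot(gapminder, aes(lifeExp)) +
  geom_histogram(bins = 30)

out_dir <- tempdir()  # illustrative output location

# Without plot=, ggsave() falls back to last_plot(),
# which at this point is the histogram p0, not the scatter plot
ggsave(file.path(out_dir, "last_plot.png"), width = 7, height = 5)

# Naming the plot saves the scatter plot no matter what was built last
ggsave(file.path(out_dir, "scatter_plot.png"), plot = p, width = 7, height = 5)
```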
Here we embed the scatter plot into this document